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Abstract 

Latent factor GARCH models are difficult to estimate using Bayesian methods 
because standard Markov chain Monte Carlo samplers produce slowly mixing and 
inefficient draws from the posterior distributions of the model parameters. This pa¬ 
per describes how to apply the particle Gibbs algorithm to estimate factor GARCH 
models efficiently. The method has two advantages over previous approaches. Eirst, 
it generalises in a straightfoward way to models with multiple factors and to various 
members of the GARCH family. Second, it scales up well as the dimension of the 
observation vector increases. 
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1 Introduction 


This paper discusses latent factor generalised autoregressive conditionally heteroscedastic 
(GARCH) models. Factor GARCH models play two roles. The first is that they are a 


convenient type of multivariate GARCH model (Bauwens et ah, 2006), in which the 


factor structure provides a direct and parsimonious way to model the effects of time- 
varying volatility in a multivariate setting. Second, they are a natural extension of the 
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latent factor model approach to time series modelling. For instance, factor models have 


a good track record in macroeconomic time series analysis (Stock and Watson, 2002), 
and the addition of GARCH errors could improve a model’s £t when the observations are 
conditionally heteroscedastic. 

The use of factor GARCH models in a Bayesian context has been hampered by their 
computational difficulty. While it is possible to use single-move MCMC methods for 
this class of models, the resulting draws only explore the posterior distribution of the 
parameter vector slowly and inefficiently ( |Fiorentini et ah' , 2004). These methods are 
also difficult to implement in models with more than one latent factor. 

The main contribution of our article is to demonstrate that the particle Gibbs sampler 
(Andrieu et ah, 2010) is particularly well suited to nonlinear latent factor models with a 


GARCH structure because we can use a fully adapted particle hlter (Pitt and Shephard 


1999) to produce rapidly mixing draws from the parameter vector. Our article shows 


that the method can handle multiple factors and can be easily be generalised to apply 
to other members of the GARCH family. The statistical structure of factor GARCH 
models means that the method scales well with the dimension of the observation vector 
because the variability in the estimated likelihood due to generating the latent factors 
decreases as the dimension of the observation vector increases. Our article also shows 
that in certain cases we can make the inference invariant to the order of the elements 
in the vector of the dependent variable, while in the general case we take care to ensure 
that the empirical results are invariant to the order. Although not shown in this article, 
it is clear that our method will accommodate regime changes and structural breaks in a 
straightforward way. 

The paper is organized as follows. Section outlines the model and the sampling 
scheme. Section demonstrates the performance of the sampling scheme using a simu¬ 
lated example. Section applies the methodology to US stock returns. 


2 Inference 

We consider a factor model given by 


Vt — f3ft + Q, (1) 

where is an A^ x 1 vector of observations, /* is a A' x 1 vector of latent factors (where 
K -C N)^ and e* is an A^ x 1 vector of idiosyncratic errors. The distributions of ft and et 
can be chosen in many ways. Our focus is on versions of ([^ in which the latent factors 
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have volatilities given by GARCH processes, so that 



( 2 ) 

(3) 


for j = 1,..., iC. In other words, represents the conditional variance of the factor. 
We represent the diagonal variance-covariance matrix as Af, where the entry on the 
diagonal is Xj^. 

The methods described here can be applied to models with many different distribu¬ 
tions over the idiosyncratic errors et- In what follows, we will assume that et iV(0,Af), 
with Af a diagonal matrix. Within this framework, we will consider two broad cases. 
The hrst is when Af is is the same unknown parameter at all time periods; in the 
second case, the idiosyncratic variances can follow GARCH processes similarly to the 
variances of the latent factors: 


Af = diag (Afi,..., Af 
Ai, = 6i + piXfi-_i + 


(4) 

(5) 


for i = 1,..., A'. Conditional on information up to period (t — 1), using either assumption 
regarding the behaviour of Af, the vector of latent factors is distributed as ft iV(0,Af), 
and the observation vector yt A^(0,/3AfAff' -1- Af). We show below that this condi¬ 


tionally Gaussian structure makes sequential Monte Carlo inference particularly efficient 


because a fully adapted particle hlter can be applied. 

The free parameters of the model are the elements of /?, the factor GARCH parameters 
{'jj, aj, 9j) for j = 1,..., A', and either the hxed values of Af or the GARCH pararam- 
eters for the idiosyncratic errors indexed by i = 1,..., A^. We can carry out 

inference on these parameters, and generate forecasts of future observations, using the 
Gibbs sampler. The following sections briefly outline how an efficient Gibbs sampler can 
be applied to the model in Q. 

Note that the estimation method described below can be generalised straightforwardly 
to any variant of the model which is conditionally linear and Gaussian (that is, conditional 
on the latent state at time t — 1). For instance, we use a GARCH-in-mean (GARCH-M) 
model for the observation vector in the empirical application described in detail below. 
In general, the methods described here can be applied to many members of the GARCH 
family. It is also straightforward to include exogenous observed variables on the right- 
hand side of ([^, though we omit this option throughout the paper for clarity. 


3 


2.1 The latent factors 

Conditional on the parameters of the model, we can draw of the latent factors fi^T using 


a fully adapted variant of particle Gibbs ( 

Andrieu et ah 

2010 

). To improve the efficiency 

of the Gibbs draws, we implement ancestor sampling ( 

Lindsten et ah 

2012 

), as described 


below. In addition to conditioning on the parameters, the particle Gibbs algorithm also 
uses a draw of fi-T from a previous iteration. The sampler is initialised by setting this 
previous draw to an arbitrary value. We begin the particle Gibbs algorithm with M 
copies of the factor variance in the hrst period, Af, initialised to its unconditional value. 


diag 


( __V 

V1 / 


Additionally, the hrst particle takes the value of fi from the previous draw fi-T that 
we condition on, and the remaining (M — 1) particles get a draw of /i iV(0,Af). 
Gonditional on the hrst observation yi, we resample the particles in proportion to their 
likelihood 


P{yi\fi) =oc exp ■ 


We then choose a particle index bi using an ancestor sampling step, described below. 
For the following periods f G {2,..., T}, we carry out the following steps: 

1. Galculate the one-step prediction weights = p{yt\ft-i, Af) for fc G {1,..., M}, 


(k) 

= 


(2vr) 




exp 




where 


= ((Af )-i - (Af)-‘,3(/f'‘>)-'/3'(Af)-') 

(//<‘l)-' = /J'(Afr'/3+(Afr' 

and the values of Af and Af are conditional on the value of the particle indexed 
by k. 

2. Resample the particles 1,...,M with probability oj^f_i /but keep the 
particle indexed by bt unchanged. 

3. Draw a value of / ~ p{ft\yt, ft-i, Af) for each particle k E {1,..., M}. This density 
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can be calculated as where 

= (//(t))^'Afj,„ 

Keep the particle indexed by bt unchanged. 

4. Update the values of and for each particle, using equations ^ and 

5. Carry out an ancestor sampling step by calculating the backward weights given 
by 

T 

= Yl P{ys\fs,K)p{fs\^s), (6) 

S=t-^1 

where we condition on the particular draw and associated A^^ for the hrst 
period, then on the given path f(t+i):T thereafter. Following Lindsten et ah (2012), 
we truncate the product in ([^ after a fixed number periods. (In the examples below, 
we truncate after hve periods, since this provided satisfactory approximations to 
the exact values.) As a result, the computing time increases linearly with T, rather 
than quadratically. 

6. Choose a particle index bt oc Wt\T-, and set ft = ft^^"'- 


Lindsten et ah 


(2012 


2.2 The factor loadings 


Conditional on a draw of /i:t, we can take a draw of the factor loadings (3. Broadly, there 
are two approaches that can both be used at this point. The first approach, widely used 
in Bayesian inference, involves imposing restrictions on the structure of (3 to guarantee 
identihcation. A well-known aspect of factor models such as ([^ is that we cannot directly 
identify a single set of values for the latent factor series fi^T, but only an equivalence class 
under the action of orthogonal rotations. In other words, if Q is any K x K orthogonal 
matrix, then a vector of factors ft with loadings (3 will have the same likelihood as another 
vector ft = Qft with loadings (3 = f3Q'. In order to pin down particular values for the 
latent factors, we therefore need extra identifying assumptions. Two common choices in 
the Bayesian econometric literature are, hrst, to let /? be a triangular matrix and assume 
that / has unit variance (Geweke and Zhou, 1996); or, second, to have an unrestricted 


variance for / and assume that f3 is triangular with ones on the diagonal (Aguilar and 


West, 2000). Using one of these assumptions, and given a draw of fi:T, the posterior 


distribution of f3 is conditionally normal. Taking a conditional draw of f3 effectively 
means carrying out a linear regression. Thus it is straightforward to apply one of these 
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identification schemes in the Gibbs sampler, and we use one of them for our GARCH-M 
example below. 

This type of identihcation scheme is simple, generally applicable, and widely used. 
Its main drawback is that the resulting inference can depend on the order in which the 


components of yt happen to be arranged (Ghan et ah, 2014). The reason for this is that 


the triangular identihcation schemes introduce a discontinuity into the mapping from the 
reduced-form values 13ft to the factor loadings (3. One can understand this intuitively by 
considering arrangements in which the observation is in fact uncorrelated with the 
factor. The triangular identihcation schemes ehectively place a prior weight of zero on 
those cases. Thus their assessment of the model’s ht to the data can be severely hampered. 
Instead, it is possible to use a sampling method that is invariant to reorderings of the 


observation vector (Ghan et ah, 2014), which we summarise here. The disadvantage 


of this method is that at present it may not be as widely applicable; its assumptions 
are invalidated if the latent factors have a GARGH-M structure instead of GARGH, for 
instance. We must also assume in this case that the idiosyncratic errors have constant 
variance, Af = A^. These restrictions may or may not be important, depending on the 
application in question. We use this method for our simulated examples in Section 
where we assume constant error variances, as well as using it to provide a robustness 
check in the more general model of Section 

Briefly, the invariant method for inference on (3 hrst stacks the row vectors // into a 
matrix F. The matrix F(3', which has rank K, can be decomposed as 

FI3' = f/A, 


where U G Vk,t and A G Here Vk,t is a Stiefel manifold. A Stiefel manifold is a 

space consisting of orthogonal A'-frames in the ambient space (James, 1954 Strachan 


and Inder, 2004). The prior for A is chosen to be matrix-normal, so that each element 
of A is independently normally distributed with mean zero and variance 1 /(ca)^. The 
matrix U functions as the coordinates of the row space sp{F) of the reduced-rank matrix 
F/3', seen as an element of Gk,t, the Grassmannian Gk,t being the space of all linear 


A'-dimensional subspaces embedded in (James, 1954 Strachan and Inder, 2004). The 


decomposition of Ff3' into t/A is not convenient enough to work with, because U has 
a hxed orientation within the plane sp{F) as the plane moves through Gk,t- However, 
suppose we act on it with an orthogonal matrix G G 0{K) (writing 0{K) for the space 
of orthogonal matrices), so that 


UA = {UG) (C"A) = UaA^ 


6 























Then the matrix will have the same prior as A, by the rotational invariance of the 
normal distribution; and if U and C have uniform priors on Gk,t and 0{K), then their 
product will have a uniform prior on Vk,t, given by U'dU = 1/(cgCo), where cg and cq 


are the normalising constants for Gk,t and 0{K) (James, 1954). 


Finally, the Chan et ah (2014) procedure introduces parameter expansion (Liu and 


Wu, [1999 ) to turn the prior distribution into a convenient conjugate form. Let A ~ 
W{Ir,T — N), and let k be its Cholesky decomposition, so that A = kk'. Here, hF(A, u) 
is a Wishart distribution with scale matrix A and u degrees of freedom. Rewrite the 
reduced-form matrix as 

UaAa = {UaK){K-^Aa) = F (3'. 

The Jacobian for this transformation is 


{dA){U'JUa){dA^) = 2^\F'F 


,-{T-N-K-l) 


{dA){dF) 


. Based on the assumptions made so far, the prior is 


p{Ua,Aa,A) = 


1 

(CX\ 

_CGCo 

\2ti) 


' exp (^-ytrAX) 


— (-Aa ) (dA) 

cw V 2 


with Cw representing the normalising constant for the Wishart distribution. From this, 
it follows that 

p(/?,F) (X (^_|tr/3F'F/?') exp (-^trF'F^ (d/?)(dF) 


Therefore, conditional on F, the prior on f3 is Gaussian. Thus, the parameter expan¬ 
sion introduced by Chan et ^ (2014) produces a conjugate prior for the regression of [3 on 
F, meaning that the conditional posterior p{/3\F ,...) is Gaussian. Note that, while the 


original analysis by Chan et al. (2014) uses homoscedastic latent factors, that assumption 


seems not to be required for this derivation to go through; in particular, it still applies 
when the rows of F have time-varying volatility governed by a GARCH process. Writing 
/3i for the row of the loading matrix, and Pi for the T x 1 vector of observations on 
the data series, we have 


A'^.cx) ~ Af ( [(1 + c,h^)F'FY" F'vt , 

Af [(l + cxAf)F'F]- 


( 7 ) 
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Although the Chan et al. (2014) method can be used to generate draws of jS, which then 
provide conditional draws of F, it does not circumvent the original problem of separately 
identifying the rotation of I3F. In other words, the draws of (3 and F generated by 
the Gibbs sampler are implicitly providing draws of the reduced-rank matrix fiF. If an 
econometrician wishes to interpret the latent factors F or the loadings f3 separately, they 
must still impose some kind of identification scheme, such as the diagonal structures of 


Geweke and Zhou 

(l')'Mi 

) or 

Aguilar and West 

(2000 

). The advantage of the 

Chan et al. 


(2014) approach is that the estimates of (3F, and the resulting judgements about the 
accuracy of the model, do not depend on the order of the components of y. That would 
not be the case if we imposed the identification scheme during the estimation. 


2.3 The idiosyncratic errors 

If the idiosyncratic error variances A® are homoscedastic, then it is convenient to impose 
independent inverse Gamma prior distributions with mean /ie and degrees of freedom Ve- 
Consequently, their posterior distributions are inverse Gamma with (T -|- degrees of 
freedom and mean + z/g/ ye) ■ 

If the idiosyncratic error variances are assumed to follow GARCH processes, then we 
can carry out inference on their parameters in the same manner as described in the next 
subsection. 


2.4 The GARCH parameters 

Having obtained draws of fi.,T and j3, we can use a Metropolis-within-Gibbs step to obtain 
draws of the GARCH parameters. To increase the efficiency of the Metropolis Hastings 
moves, we first reparameterise the set of GARCH parameters for factor number j by 

+ dj ^2 = ^3 = ^ ( 8 ) 

1-^1 -01 

and then by 

(pi = log(^/>i) - log(l(p3 = log(V’ 2 ) 993 = log(^/’3) - log(l - (9) 

Writing ip'^ = {ipi, ip 2 , (fs), we use these coordinates to propose new parameters via ip^ ~ 
N{ip'^, S). The proposal covariance S is initialised to a matrix with small positive numbers 
on the diagonal, and then updated using the adaptive Metropolis Hastings scheme of 
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Haario et al. (2001). The new parameters are accepted with probability 


Pj = l A 

The conditional likelihood 7 r(-) is given by 






nat 


f2 

'2A, 


where the variance follows 

At+i = + ajff + OjXt 

. Here the proposed GARCH parameters are calculated from ipP by inverting ([^ and ([^. 

When using the reordering-invariant specification for the factor loadings, the marginal 
variance of the latent factors becomes unidentihed. In our implementation, we £x the 
marginal variance at 1, by assuming that 7 ^ = (1 — oij — 6k)- This restriction is straight¬ 
forward to impose in the Metropolis Hastings steps. 


2.5 The number of latent factors 


Inference on the number of factors K can be carried out using the Reversible Jump 


method (Green, 1995). The Reversible Jump implementation described by Lopes and 


West (2004) for the homoscedastic factor case is straightforward to extend to models in 


which the latent factors and idiosyncratic variances follow GARGH processes. In this 
section, we summarise and explain this extension of the Reversible Jump method. We 
note in passing that models with homoscedastic latent factors can use the Savage-Dickey 


Density Ratio (SDDR) (Verdinelli and Wasserman, 1995), as described by Ghan et al. 
(2014). However, after applying that alternative method to simulated examples, we 


concluded that it is not feasible for GARGH factor models, since the changing volatilities 
of the latent factors make the estimation of the SDDR inefficient and slow to converge. 


In implementing the Reversible Jump method in the style of Lopes and West (2004), 


we begin with separate preliminary estimation runs for all values of JL up to a chosen 
maximum K. The draws of the factor loadings and the GARGH parameters from these 
preliminary runs are then used to generate proposal draws for the model-choice steps, as 
follows. Let (3^ and Bk denote the posterior mean and covariance of the factor loadings 
estimated from the preliminary estimation run with k latent factors. The proposal density 
for /9, conditional on using k latent factors, is then qk{(3) = J\r{(3i^,hBk), where 6 > 0 is 
a scale factor chosen to ensure that the tails of the proposal density are fat enough. 
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Similarly, let the estimated posterior mean of the GARCH parameters, transformed as in 
equation (|^, be denoted and the estimated posterior covariance be <hfc. The proposal 
density for the transformed GARCH parameters is then qk{^) = c^k), with c > 0 

a hxed scale parameter. Note that using the transformed values ((pi, (p 2 , rather 
than the original parameters ( 7 , a, 6), ensures that the proposed GARCH parameters are 
always positive and within the stable region. The proposal density for the idiosyncratic 
error variance parameters, which we will write as qk{o'^), can either be an inverse gamma 
in the homoscedastic case, or a Gaussian on the GARCH parameters transformed as in 
equation (j^. 

Writing 6k for the entire parameter vector used in a model with k latent factors, the 
proposal density is 

qk{,6k) = qk{(i)qk{q^)qk{,cr^) ( 10 ) 

At the end of each Gibbs sampler iteration, we can then generate a between-model 
move as follows. We sample a proposed value k' uniformly from {1,..., if}. We then 
use the fully-adapted particle hlter to obtain a simulated value of the proposed likelihood 
vhjW, 6k') and the current likelihood p( 2 /|A;, 6k)- This allows us to compute the Reversible 
Jump acceptance probability 


viyW ,6k')v{6k'\k')p{k')qk{6k) 
p{y\k, 6k)p{6k\k)p{k)qk'{6k') 


Here p{6k\k) is the prior density on the parameter vector 6k-, and p{k) is the prior mass 
on the number of latent factors. With probability a, the proposed value k' is accepted, 
and we carry out the next Gibbs iteration using the proposed parameters 6ki- In that 
case, the current draw of the latent factors fi-,T can be padded with zeros if k' > fc, or 
truncated if k’ < k. 


3 Simulated examples 

To evaluate the performance of the estimation method described above, we carried out 
a series of tests on simulated data. We simulated 200 periods of data using a model 
with two latent factors. For both factors’ GARCH parameters, we chose aj = 0.04, 
6j = 0.9, and (as discussed above) ■jj = (1 — aj — 6j)- Unless otherwise specihed, we 
used N = 5 observation components, M = 10 particles, and idiosyncratic variances 
A'® = diag(0.02,..., 0.02). We carried out hve replications of each experiment. That is, 
we simulated hve different data sets using each combination of settings, then estimated 
the model on each one. For each replication, we obtained 20,000 parameter draws. In our 
estimation, we constrained the draws of jj to be equal to (1 — — 6j) as discussed above. 
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(a) Trace plot; black line shows (b) Estimated autocorrelations 
the true value 

Figure 1: Analysis of draws of (/3F)i^5from one run of the simulated example 


The diagonal values of A® (which are in fact the same) were estimated independently. 

We evaluated the efficiency of the estimation method by calculating the integrated 
autocorrelation times (lACTs) for particular groups of parameters. Given a vector of 
draws 9i, the lACT is dehned as 

OO 

IACT{e) = 1 + 2 E Pr{e), (12) 

T=1 


where Prid) is the autocorrelation between 9i and 9^^^-. We calculated the lACTs using the 
overlapping batch means method of Flegal and Jones (2010). The times can be intuitively 
interpreted as inflation factors relative to independent draws from a parameter’s marginal 
posterior distribution. That is, a value of 20 would suggest that we require 20 draws from 
the algorithm to obtain the equivalent of one independent draw from the posterior. 


3.1 Results 

The sampler produced parameter draws with a low degree of autocorrelation. The left 
panel of Figure [^shows a typical trace plot of the draws of f3f —in this case, corresponding 
to the hrst observational component in the hfth time period. It appears that the sampler 
is exploring the posterior distribution quite rapidly. This impression is supported by the 
estimated autocorrelations of those draws, presented in the right-hand panel of the hgure. 

We carried out the Reversible Jump procedure to estimate the number of latent fac¬ 
tors. It selected two latent factors (the correct number) with a high probability. 
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M 

(/^/) 

A^ 

a 

e 

5 

1.73 

6.4 

37.8 

55.0 

10 

1.30 

3.2 

26.4 

19.4 

20 

1.17 

2.9 

29.1 

17.4 

40 

1.08 

2.2 

23.6 

15.3 


Table 1: Median integrated autocorrelation times for parameter groups, using different 
numbers of particles M 


M 

iM) 

A® 

a 

e 

5 

97.11 

74.9 

69.9 

94.7 

10 

79.70 

52.2 

63.4 

95.9 

20 

76.35 

36.1 

68.3 

93.1 

40 

4.26 

8.0 

42.5 

30.8 


Table 2: Highest integrated autocorrelation times for parameter groups, using different 
numbers of particles M 


3.2 The number of particles 

We tried varying the number of particles M to ascertain its effect on the efficiency of 
the estimation. Table [T] reports the median integrated autocorrelation times for different 
groups of the model parameters over the hve replications. Table reports the maximum 
lACTs for each parameter group. The results suggest that the worst-case performance is 
fairly good, even for small numbers of particles. 

The estimated median autocorrelation times are relatively low for all groups of param¬ 
eters, but particularly for the reduced-form factor values /?/ and the idiosyncratic noise 
variances A®. The values for f3f in Table 0 are consistent with the asymptotic theory 
developed by Pitt et ah (2012) for particle Metropolis Hastings: the lACTs decrease in 
proportion to exp(l/M). However, the levels of the lACTs are considerably lower than 
might be expected from a Metropolis Hastings estimation run. 

Since the time required for running the particle Gibbs algorithm scales roughly in 
proportion to M, we can use the product of M and the lACT as a measure of computing 


M 

(/S/) 

A^ 

a 

e 

5 

8.66 

32.1 

189.0 

274.8 

10 

13.03 

32.1 

263.6 

193.7 

20 

23.45 

58.8 

582.1 

348.3 

40 

43.21 

86.7 

944.6 

613.7 


Table 3: Median computing times, calculated as M x I ACT 
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N 

m 

jyE 

a 

e 

5 

1.29 

4.8 

44.0 

13.6 

10 

1.30 

3.2 

26.4 

19.4 

25 

1.29 

2.1 

65.6 

69.8 

50 

1.32 

2.4 

60.8 

61.1 

100 

1.53 

5.2 

25.0 

18.2 


Table 4: Median integrated autocorrelation times for different parameters with various 
sizes of the observation vector 


r 

Wf) 

A^ 

a 

e 

0.10 

6.15 

44.7 

27.6 

39.1 

0.02 

1.30 

3.2 

26.4 

19.4 

0.01 

1.28 

3.3 

40.9 

31.5 


Table 5: Median integrated autocorrelation times for different parameters with various 
idiosyncratic noise variances A® 


time. These values, reported in Table suggest that around 5 to 10 particles is optimal 
for this model. 

3.3 The number of observations 

We varied the number of observations N from 5 up to 50. Table summarises the 
results. The inference on the latent factors (our main object of inference) remains broadly 
similar, though the efficiency of inference about the idiosyncratic errors improves as 
the number of observations increases. The efficiency of the estimates of the GARCH 
parameters becomes somewhat poorer for medium-sized N, partly reflecting the difficulty 
of separately identifying the a and 6 parameters. However, the efficiency then improves 
for N = 100 as the increase in N helps to reduce the noise in the estimated likelihood 
because of the GARCH structure. 

3.4 The idiosyncratic noise variance 

We also varied A^, the variance of the idiosyncratic noise vector e. That is, the values 
on the diagonal of the matrix were set to the same value on each repetition, but we 
made that common value higher and lower, using the values listed in Table 

Unsurprisingly, the performance of the sampler degrades somewhat as A® increases, 
meaning that the observations become less informative about the latent factors. 
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4 Empirical application to US stock returns 


We applied the estimation method described above to a sample of monthly US stock 
returns. Our data, kindly provided by Kenneth French, consisted of the monthly returns 
for 17 value-weighted industry portfolios. We used a sample running from January 1980 
to December 2012, a total of 396 observations. 

In this case, we used a GARCH-M model for the volatility of the latent factors. This 
is equivalent to positing a leverage effect—that is, an interaction between the factors’ 
volatilities and their conditional means. The model is summarised by 


Si = Pfi + it, 


(13) 


where now the latent factors ft have conditional means depending on their volatility 




(14) 


Inference on this model can be carried out through a straightforward generalisation of 
([^, since it maintains the conditionally linear and Gaussian structure of the basic factor 
GARGH model. The conditional variances Af and Af are assumed to have GARGH 
structures, as described in section 

This model invalidates the assumptions of the invariant factor loading estimation 
method, so we chose instead to identify the factor loadings by assuming that the loading 
matrix (3 has ones on the main diagonal and zeros above the main diagonal. We imposed 
independent A^(0,1) priors on the rest of the elements of /3. For the leverage parameters, 
we assumed independent priors given by Tj ~ A^(0,1), and each of the GARGH parameters 
7 i, Qfj, 6i, 6i, Pi and (fi were given independent U[0,1] priors. 


Given the difficulties identihed by Ghan et ah (2014), the ordering of the components 


of the observation vector should be chosen with some care. Inference becomes unstable 
if the observation component is not, in fact, correlated with the latent factor. In 
the case of the dataset used here, we have no prior information about industry groups 
that would make one ordering seem more reasonable than another. We therefore chose an 
ordering in two stages, starting by roughly ranking the series in order of the explanatory 
power, and then conhrming that the resulting order was not obviously inconsistent with 
an invariant specihcation. 

For the hrst stage, we carried out a linear regression of each of the 17 industry return 
series on every other series, selecting the series that provided the highest single as our 
hrst observation component. We chose the second and subsequent series by recursively 
adding them to the set of independent variables in the linear regressions, each time 
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choosing the series that provided the highest single and stopping after choosing the 
hrst eight components of the observation vector. 

Having established this ordering, we checked it against the results of a factor GARCH 
model that we estimated with reordering-invariant loadings, as described above. (Note 
that this model assumes homoscedastic idiosyncratic errors and no GARCH-in-mean ef¬ 
fect, so it is only an approximation of our hnal model specification; this check is indicative 
rather than conclusive.) Gonsider a single draw f3 of the factor loadings from the invariant 
specihcation of the model. We can relate it to the specihcation with ones on the main 
diagonal by using a QR decomposition of /3 to write 





E 

Z 


DQ = (5DQ. 


(15) 


where Q is orthogonal, L is lower triangular, E is lower triangular with ones on the 
diagonal, and D is diagonal and positive. So, if the corresponding draw of the latent 
factors is /, then the estimate of / produced by the specihcation with ones on the diagonal 
can be calculated as Q~^D~^ f. This will be numerically unstable if the numbers on the 
diagonal of D are too small. We therefore took 1000 draws using the invariant algorithm 
and the proposed ordering of variables, with the number of latent factors K ranging from 
1 to 8. The smallest element of D was found to be 6 x 10“^, which occurred in the 
eight factor case. While far from ideal, this is well within hoating-point precision. This 
suggests that our variable ordering is adequate. 

We estimated this model on the monthly equity return data using the Reversible 
Jump method described above to choose the number of latent factors. In the preliminary 
estimation runs, we estimated models with between one and eight latent factors. 


4.1 Results 

The Reversible Jump method placed a high probability on the version of the model with 
seven latent factors. In Table we report the estimated signal to noise ratios for each of 
the 17 observation components. The table reports the sample variance of each component 
(with the monthly returns measured in percentage points), and its estimated idiosyncratic 
variance. The latter was calculated as the mean of the unconditional noise variance Af, 
given by J*/(1 — pi — 0i). The hnal column reports the signal to noise ratio, calculated as 
the difference between the sample variance and the idiosyncratic variance, divided by the 
idiosyncratic variance. Most components are estimated to have fairly high SNRs. This 
means that the fully adapted hlter for the GARGH components will be very efficient in 
this case. 
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Industry sector 

Sample 

variance 

(%) 

Idiosyncratic 

variance 

(%) 

SNR 

Food 

18.5 

1.2 

14.6 

Mining and Minerals 

65.9 

29.1 

1.3 

Oil and Petroleum Products 

33.1 

4.2 

6.8 

Textiles, Apparel & Footwear 

39.1 

15.0 

1.6 

Consumer Durables 

33.0 

7.7 

3.3 

Chemicals 

34.9 

7.2 

3.8 

Drugs, Soap, Perfumes, Tobacco 

20.4 

14.4 

0.4 

Construction & Construction Materials 

37.8 

11.5 

2.3 

Steel Works Etc 

66.6 

49.2 

0.4 

Fabricated Products 

31.4 

6.8 

3.6 

Machinery & Business Equipment 

50.4 

24.5 

1.1 

Automobiles 

46.2 

6.5 

6.1 

Transportation 

30.6 

7.3 

3.2 

Utilities 

15.8 

3.7 

3.3 

Retail Stores 

28.0 

7.9 

2.5 

Banks & Insurance Companies 

31.3 

2.6 

10.8 

Other 

26.1 

6.9 

2.8 


Table 6: Estimated variances for US industry sectors. 


We find no evidence for leverage effects in this dataset, with the posterior credible 
intervals of each tj including zero in all model variants. This stands in contrast to the 
results on UK stock return data analysed in Fiorentini et al. (2004), which used a single 
latent factor with homoscedastic idiosyncratic errors. It may be that putative leverage 
effects can appear as artefacts of time-varying idiosyncratic volatility. 

The Gibbs sampler produced these results efficiently. The draws of / show a partic¬ 
ularly rapid degree of mixing; Figure]^ shows an example trace plot. The low autocorre¬ 
lation of the draws is echoed in their low lACT, estimated to be 1.7. The median lACT 
for all components of / was 2.8, and the maximum was 26. The draws of f3 are a little 
slower mixing than in the simulated examples considered above, with a typical example 
plotted in Figure]^ The draws of the factor loadings still mix relatively rapidly, with a 
median lACT of 16 and a maximum of 37. 


5 Conclusion 

Recent developments in sequential Monte Carlo methods have opened up new possibilities 
for Bayesian computation on latent factor models with time-varying volatility. As our 
article demonstrates, the particle Gibbs algorithm provides a flexible and efficient frame- 
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(a) Trace plot (b) Estimated autocorrelations 

Figure 2: Analysis of draws of /i 5 o,i from the estimated GARCH-M model with four 
latent factors 



(a) Trace plot (b) Estimated autocorrelations 

Figure 3: Analysis of draws of from the estimated GARCH-M model with four latent 
factors 
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work for carrying out inference on latent factor models with GARCH factors and GARCH 
errors. It can be applied to models using an invariant specification for the factor loadings 
(where possible), or to those using a more traditional triangular identification scheme. 
The conditionally linear-Gaussian structure of GARGH makes it particularly well suited 
to particle methods. The resulting parameter estimates mix well and explore the poste¬ 
rior distribution rapidly. It is possible to extend the methodology in a straightforward 
way to GARGH factor models with regime changes and structural breaks. 
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